
    Reverberation time estimation on the ACE corpus using the SDD method

    Reverberation Time (T60) is an important measure for characterizing the acoustic properties of a room. The author's T60 estimation algorithm was previously tested on simulated data, where noise was artificially added to speech after convolution with impulse responses simulated using the image method. We test the algorithm on speech convolved with real recorded impulse responses and noise from the same rooms, taken from the Acoustic Characterization of Environments (ACE) corpus, and achieve results comparable to those obtained on simulated data.
    Comment: In Proceedings of the ACE Challenge Workshop, a satellite event of IEEE WASPAA 2015 (arXiv:1510.00383)
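The abstract does not detail the SDD method itself, so as background, here is a minimal sketch of the classical way T60 is measured from a known impulse response: backward-integrate the squared RIR to get the Schroeder energy decay curve, fit a line to part of the decay, and extrapolate to 60 dB. The synthetic RIR and the -5 to -25 dB fitting range are illustrative choices, not taken from the paper.

```python
import numpy as np

def estimate_t60_schroeder(rir, fs):
    """Estimate T60 from a room impulse response via the Schroeder
    energy decay curve (EDC), extrapolating the -5 to -25 dB decay.
    Classical background, not the paper's SDD method."""
    # Schroeder backward integration of the squared RIR
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    edc_db = 10.0 * np.log10(energy / energy[0])

    # Fit a line over the -5 dB to -25 dB portion of the decay
    i5 = np.argmax(edc_db <= -5.0)
    i25 = np.argmax(edc_db <= -25.0)
    t = np.arange(i5, i25) / fs
    slope, intercept = np.polyfit(t, edc_db[i5:i25], 1)

    # T60 is the time taken for a 60 dB decay at the fitted slope
    return -60.0 / slope

# Synthetic RIR: exponentially decaying noise with a known T60 of 0.5 s
fs = 16000
t60_true = 0.5
rng = np.random.default_rng(0)
t = np.arange(int(fs * t60_true)) / fs
rir = rng.standard_normal(len(t)) * np.exp(-3.0 * np.log(10) * t / t60_true)
print(round(estimate_t60_schroeder(rir, fs), 2))
```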

    Source Coding in Networks with Covariance Distortion Constraints

    We consider a source coding problem with a network scenario in mind and formulate it as a remote vector Gaussian Wyner-Ziv problem under covariance matrix distortion constraints. We define a notion of minimum for two positive-definite matrices, based on which we derive an explicit formula for the rate-distortion function (RDF). We then study special cases and applications of this result. We show that two well-studied source coding problems, namely remote vector Gaussian Wyner-Ziv problems with mean-squared error and mutual information constraints, are in fact special cases of our result. Finally, we apply our results to a joint source coding and denoising problem. We consider a network with a centralized topology and a given weighted sum-rate constraint, where the received signals at the center are to be fused to maximize the output SNR while enforcing no linear distortion. We show that one can design the distortion matrices at the nodes so as to maximize the output SNR at the fusion center. We thereby bridge denoising and source coding within this setup.
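For context on the mean-squared-error special case mentioned above, the classical RDF of an independent Gaussian vector source under a sum-MSE constraint is computed by reverse water-filling: each component is quantized down to a common "water level" theta. The sketch below is this textbook special case only, not the paper's covariance-distortion generalization; the bisection search is an illustrative implementation choice.

```python
import numpy as np

def gaussian_vector_rdf(eigvals, D):
    """RDF (in bits) of an independent Gaussian vector source with
    per-component variances `eigvals`, under total MSE distortion D,
    via reverse water-filling: D_i = min(theta, lambda_i)."""
    eigvals = np.asarray(eigvals, dtype=float)
    lo, hi = 0.0, eigvals.max()
    # Bisect on the water level theta so that sum(min(theta, lam)) = D
    for _ in range(100):
        theta = 0.5 * (lo + hi)
        if np.minimum(theta, eigvals).sum() > D:
            hi = theta
        else:
            lo = theta
    d = np.minimum(theta, eigvals)
    return 0.5 * np.sum(np.log2(eigvals / d))

# Two-component source with variances (4, 1) and total distortion D = 1:
# the water level is theta = 0.5, giving R = 0.5*(log2 8 + log2 2) = 2 bits
print(gaussian_vector_rdf([4.0, 1.0], 1.0))
```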

    Binaural Speech Enhancement Using STOI-Optimal Masks

    STOI-optimal masking has previously been proposed and developed for single-channel speech enhancement. In this paper, we consider its extension to binaural speech enhancement, a task in which spatial information is known to be important to speech understanding and should therefore be preserved by the enhancement processing. Masks are estimated for each of the binaural channels individually, and a `better-ear listening' mask is computed by taking the maximum of the two masks. The estimated mask supplies speech-presence probability information for each time-frequency bin to an Optimally-modified Log Spectral Amplitude (OM-LSA) enhancer. We show that the proposed method, applied to binaural signals with a directional noise, not only improves the SNR of the noisy signal but also preserves the binaural cues and intelligibility.
    Comment: Accepted at IWAENC 202
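The `better-ear listening' step described above reduces to an elementwise maximum over the two per-channel masks. A minimal sketch, with hypothetical mask values; mask estimation itself (the STOI-optimal part) is not shown.

```python
import numpy as np

def better_ear_mask(mask_left, mask_right):
    """Per time-frequency bin, keep the larger of the two channel mask
    values (speech-presence values in [0, 1], shape (freq, time))."""
    return np.maximum(mask_left, mask_right)

# Hypothetical 2x3 time-frequency masks for illustration
left = np.array([[0.9, 0.2, 0.4],
                 [0.1, 0.8, 0.3]])
right = np.array([[0.5, 0.6, 0.1],
                  [0.2, 0.7, 0.9]])
print(better_ear_mask(left, right))
```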

    Graph neural networks for sound source localization on distributed microphone networks

    Distributed Microphone Arrays (DMAs) present many challenges compared with centralized microphone arrays. An important requirement for applications on these arrays is handling a variable number of input channels. We consider the use of Graph Neural Networks (GNNs) as a solution to this challenge. We present a localization method using the Relation Network GNN, which we show shares many similarities with classical signal processing algorithms for Sound Source Localization (SSL). We apply our method to the task of SSL and validate it experimentally using an unseen number of microphones. We test different feature extractors and show that our approach significantly outperforms classical baselines.
    Comment: Presented as a poster at ICASSP 202
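A toy sketch of why a Relation Network handles a variable channel count: a shared function g processes every pair of per-microphone feature vectors, and f maps the pooled result to an output. Pooling over all pairs makes the model invariant to microphone ordering and agnostic to how many microphones are present. Dimensions and random weights below are stand-ins for trained parameters, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IN, D_HID, D_OUT = 4, 8, 3          # feature, hidden, and output dims
W_g = rng.standard_normal((2 * D_IN, D_HID))
W_f = rng.standard_normal((D_HID, D_OUT))

def relation_network(mic_feats):
    """mic_feats: (n_mics, D_IN), one feature vector per microphone."""
    n = mic_feats.shape[0]
    pair_sum = np.zeros(D_HID)
    for i in range(n):
        for j in range(n):
            pair = np.concatenate([mic_feats[i], mic_feats[j]])
            pair_sum += np.tanh(pair @ W_g)   # g(x_i, x_j), shared weights
    return (pair_sum / n**2) @ W_f            # f(mean over all pairs)

# The same model runs unchanged for 4 or 7 microphones
print(relation_network(rng.standard_normal((4, D_IN))).shape)
print(relation_network(rng.standard_normal((7, D_IN))).shape)
```

Because the pair features are pooled by a symmetric mean, permuting the microphones leaves the output unchanged.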

    End-to-End Classification of Reverberant Rooms using DNNs

    Reverberation is present in our workplaces, our homes and even in places designed as auditoria, such as concert halls and theatres. This work investigates how deep learning can use the effect of reverberation on speech to classify a recording in terms of the room in which it was made. Approaches previously taken in the literature relied on handpicked acoustic parameters as features for classifiers. Estimating the values of these parameters from reverberant speech involves estimation errors, inevitably impacting the classification accuracy. This paper shows how DNNs can perform the classification in an end-to-end fashion, operating directly on reverberant speech. Based on the above, a method for training generalisable DNN classifiers and a DNN architecture for the task are proposed. A study is also made of the relationship between the feature maps derived by DNNs and acoustic parameters that describe known properties of reverberation. In the experiments shown, AIRs measured in 7 real rooms are used. The classification accuracy of DNNs is compared between the case of having access to the AIRs and the case of having access only to the reverberant speech recorded in the same rooms. The experiments show that with access to the AIRs a DNN achieves an accuracy of 99.1%, while with access only to reverberant speech the proposed DNN achieves an accuracy of 86.9%. The experiments replicate the testing procedure used in previous work, which relied on handpicked acoustic parameters, allowing a direct evaluation of the benefit of using deep learning.
    Comment: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing
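The distinction the abstract draws, between having the AIR itself and having only reverberant speech, rests on the standard model of a reverberant recording as anechoic speech convolved with the room's AIR. A minimal sketch of constructing one such training example; the "speech" and AIR here are synthetic stand-ins, and the decay constant is an arbitrary illustrative value.

```python
import numpy as np

fs = 16000
rng = np.random.default_rng(0)

# Stand-ins: 1 s of white noise as "speech", a decaying-noise AIR
speech = rng.standard_normal(fs)
air = np.exp(-np.arange(fs // 4) / 800.0) * rng.standard_normal(fs // 4)

# A reverberant training example; its class label is the room the AIR
# was measured in (the end-to-end DNN sees only this signal)
reverberant = np.convolve(speech, air)[:len(speech)]
print(reverberant.shape)
```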

    Dual input neural networks for positional sound source localization

    In many signal processing applications, metadata may be used advantageously in conjunction with a high-dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from high-dimensional, multichannel audio signals received by many distributed microphones is combined with information describing acoustic properties of the scene, such as the microphones' coordinates in space, to estimate the position of a sound source. We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model these two data types in a neural network. We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method, as well as a classical Convolutional Recurrent Neural Network (CRNN). Our results show that the DI-NN significantly outperforms the baselines, achieving a localization error five times lower than the LS method and two times lower than the CRNN on a test dataset of real recordings.
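A minimal sketch of the dual-input idea: one branch encodes the high-dimensional audio features, the low-dimensional metadata (microphone coordinates) is injected alongside the resulting embedding, and a joint head maps the fused vector to a source position. All layer sizes and the random weights are illustrative stand-ins, not the trained DI-NN from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MICS, D_FEAT, D_EMB = 4, 32, 16
W_sig = rng.standard_normal((N_MICS * D_FEAT, D_EMB))
W_out = rng.standard_normal((D_EMB + N_MICS * 3, 2))  # -> (x, y) position

def di_nn(audio_feats, mic_coords):
    """audio_feats: (N_MICS, D_FEAT); mic_coords: (N_MICS, 3) in metres."""
    emb = np.tanh(audio_feats.reshape(-1) @ W_sig)     # signal branch
    joint = np.concatenate([emb, mic_coords.reshape(-1)])  # inject metadata
    return joint @ W_out                               # fused position estimate

pos = di_nn(rng.standard_normal((N_MICS, D_FEAT)),
            rng.uniform(0.0, 5.0, (N_MICS, 3)))
print(pos.shape)
```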

    Adaptive inverse filtering of room acoustics

    Equalization techniques for high-order, multichannel FIR systems are important for the dereverberation of speech observed using multiple microphones in a reverberant room. In this case, the multichannel system represents the room impulse responses (RIRs). The existence of near-common zeros in multichannel RIRs can slow down the convergence rate of adaptive inverse filtering algorithms. In this paper, the effect of common and near-common zeros on both closed-form and adaptive inverse filtering algorithms is studied. Based on this study, an adaptive shortening algorithm for room acoustics is presented.
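As background for the closed-form case, here is a minimal MINT-style least-squares sketch: stack the channel convolution matrices and solve for inverse filters g_i so that the sum of h_i * g_i is a unit impulse. With enough channels and no common zeros an exact inverse exists; common or near-common zeros make this system ill-conditioned, which is the convergence issue the abstract discusses for the adaptive variant. The tiny random "RIRs" are stand-ins for measured responses.

```python
import numpy as np

def convmtx(h, n):
    """Convolution (Toeplitz) matrix: convmtx(h, n) @ g equals
    np.convolve(h, g) for any length-n filter g."""
    C = np.zeros((len(h) + n - 1, n))
    for k in range(n):
        C[k:k + len(h), k] = h
    return C

def mint_inverse(rirs, g_len):
    """Least-squares multichannel inverse filters: solve
    sum_i h_i * g_i ~= unit impulse over all channels jointly."""
    H = np.hstack([convmtx(h, g_len) for h in rirs])
    d = np.zeros(H.shape[0])
    d[0] = 1.0                                 # target: unit impulse
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return g.reshape(len(rirs), g_len)

# Two random 8-tap channels; for 2 channels of length L, filters of
# length L - 1 suffice when the channels share no zeros
rng = np.random.default_rng(1)
rirs = [rng.standard_normal(8) for _ in range(2)]
g = mint_inverse(rirs, g_len=7)
eq = sum(np.convolve(h, gi) for h, gi in zip(rirs, g))
print(np.round(eq, 6))
```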